@sgomezvillamor
Contributor

Fix(elasticsearch): Ingest both legacy and composable index templates

Problem:
The Elasticsearch/OpenSearch source connector previously only ingested legacy index templates (using the get_template() API). Modern OpenSearch (v2.17) and Elasticsearch (7.8+) also support composable index templates, which are accessed via a different API (get_index_template()) and have a distinct data structure. This led to incomplete index template ingestion, where only legacy templates were discovered, or none if only composable templates existed.

Solution:
This PR updates the Elasticsearch source to support both types of index templates:

  1. Dual API Calls: The get_workunits_internal method now attempts to fetch both legacy and composable index templates.
  2. Smart Parsing: The _extract_mcps method has been enhanced to correctly parse the metadata (mappings, settings, aliases) from both legacy templates (root-level fields) and composable templates (fields nested under template).
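
The two response shapes described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `template_body` and `iter_templates` are hypothetical helper names, and the sample dictionaries are hand-written stand-ins for what `get_template()` (a mapping keyed by template name) and `get_index_template()` (a list under `index_templates`) return.

```python
from typing import Any, Dict, Iterator, Tuple


def template_body(raw: Dict[str, Any], is_composable: bool) -> Dict[str, Any]:
    """Return the dict holding mappings/settings/aliases for either template kind."""
    if is_composable:
        # Composable templates nest these fields under "template".
        return raw.get("template") or {}
    # Legacy templates keep them at the root of the template object.
    return raw


def iter_templates(
    legacy_response: Dict[str, Any], composable_response: Dict[str, Any]
) -> Iterator[Tuple[str, bool, Dict[str, Any]]]:
    """Yield (name, is_composable, raw_template) for both response shapes."""
    # Legacy response shape: {"<name>": {...template...}}
    for name, raw in legacy_response.items():
        yield name, False, raw
    # Composable response shape:
    # {"index_templates": [{"name": ..., "index_template": {...}}]}
    for entry in composable_response.get("index_templates", []):
        yield entry["name"], True, entry.get("index_template", {})


# Hand-written sample responses (assumed shapes, for illustration only).
legacy = {
    "logs_legacy": {
        "index_patterns": ["logs-*"],
        "settings": {"index": {"number_of_replicas": "1"}},
        "mappings": {"properties": {"msg": {"type": "text"}}},
        "aliases": {},
    }
}
composable = {
    "index_templates": [
        {
            "name": "metrics",
            "index_template": {
                "index_patterns": ["metrics-*"],
                "template": {
                    "settings": {"index": {"number_of_replicas": "2"}},
                    "mappings": {"properties": {"value": {"type": "double"}}},
                },
            },
        }
    ]
}

# Map each template name to its sorted field names, regardless of template kind.
summary = {
    name: sorted(
        template_body(raw, is_composable).get("mappings", {}).get("properties", {})
    )
    for name, is_composable, raw in iter_templates(legacy, composable)
}
```

Keeping the "where do the fields live" decision in one small helper means the rest of the extraction logic can treat both template kinds the same way.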

Impact:
This bug fix ensures that all index templates (both legacy and composable) are now correctly ingested from OpenSearch and Elasticsearch clusters, providing comprehensive metadata coverage. The changes are backward compatible with older clusters that only use legacy templates.

Related Issue:
CUS-6603


Slack Thread

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Oct 23, 2025
@codecov

codecov bot commented Oct 23, 2025

Codecov Report

❌ Patch coverage is 3.33333% with 58 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...ion/src/datahub/ingestion/source/elastic_search.py | 3.33% | 58 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (3.33%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

@sgomezvillamor sgomezvillamor changed the title from "Investigate OpenSearch index template ingestion issue" to "feat(elasticsearch): support for composable index templates" Oct 23, 2025
@sgomezvillamor sgomezvillamor marked this pull request as ready for review October 23, 2025 08:31
@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Oct 23, 2025
    ):
        yield mcp.as_workunit()
except Exception as e:
    logger.debug(f"Unable to fetch composable index templates: {e}")
Contributor

Is it ok to log this as debug?

Contributor Author

Definitely not an error, since composable index templates are sort of optional.

I can set it to warning.
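
A minimal sketch of the behavior under discussion, assuming the composable-template fetch is treated as optional and a failure is logged at warning level rather than debug. `fetch_optional` and `unsupported` are hypothetical names used only for illustration; they are not part of the connector.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


def fetch_optional(fetch: Callable[[], Dict[str, Any]], what: str) -> Dict[str, Any]:
    """Call an optional fetcher; on failure, log a warning and return an empty dict."""
    try:
        return fetch()
    except Exception as e:
        # Warning keeps the failure visible at default log levels without
        # failing the whole ingestion run.
        logger.warning("Unable to fetch %s: %s", what, e)
        return {}


def unsupported() -> Dict[str, Any]:
    # Stand-in for a cluster that does not expose the composable template API.
    raise RuntimeError("endpoint not available")


result = fetch_optional(unsupported, "composable index templates")
```

With this shape, a cluster without composable templates still yields its legacy templates, and the skipped step shows up in the logs.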

custom_properties["num_replicas"] = num_replicas
# 4. Construct and emit properties
if is_index:
    custom_properties: Dict[str, str] = {}
Contributor

Isn't this the same as _extract_template_custom_properties?

Contributor Author

Similar, but not the same.
For templates, the properties have to be handled differently depending on is_composable.

@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review Label for PRs that need review from a maintainer. labels Oct 28, 2025
@sgomezvillamor sgomezvillamor merged commit 33d36dc into master Oct 29, 2025
62 of 63 checks passed
@sgomezvillamor sgomezvillamor deleted the cursor/investigate-opensearch-index-template-ingestion-issue-5643 branch October 29, 2025 09:18